244 research outputs found

    CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules

    Get PDF
    Nowadays, knowledge extraction methods from Next Generation Sequencing data are highly requested. In this work, we focus on RNA-seq gene expression analysis and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State of the art algorithms compute a single classification model that contains few features (genes). On the contrary, our goal is to elicit a higher amount of knowledge by computing many classification models, and therefore to identify most of the genes related to the predicted class

    Predicting long-term publication impact through a combination of early citations and journal impact factor

    Full text link
    The ability to predict the long-term impact of a scientific article soon after its publication is of great value towards accurate assessment of research performance. In this work we test the hypothesis that good predictions of long-term citation counts can be obtained through a combination of a publication's early citations and the impact factor of the hosting journal. The test is performed on a corpus of 123,128 WoS publications authored by Italian scientists, using linear regression models. The average accuracy of the prediction is good for citation time windows above two years, decreases for lowly-cited publications, and varies across disciplines. As expected, the role of the impact factor in the combination becomes negligible after only two years from publication

    Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers

    Full text link
    Machine Learning (ML) algorithms are used to train computers to perform a variety of complex tasks and improve with experience. Computers learn how to recognize patterns, make unintended decisions, or react to a dynamic environment. Certain trained machines may be more effective than others because they are based on more suitable ML algorithms or because they were trained through superior training sets. Although ML algorithms are known and publicly released, training sets may not be reasonably ascertainable and, indeed, may be guarded as trade secrets. While much research has been performed about the privacy of the elements of training sets, in this paper we focus our attention on ML classifiers and on the statistical information that can be unconsciously or maliciously revealed from them. We show that it is possible to infer unexpected but useful information from ML classifiers. In particular, we build a novel meta-classifier and train it to hack other classifiers, obtaining meaningful information about their training sets. This kind of information leakage can be exploited, for example, by a vendor to build more effective classifiers or to simply acquire trade secrets from a competitor's apparatus, potentially violating its intellectual property rights

    Common operation scheduling with general processing times: A branch-and-cut algorithm to minimize the weighted number of tardy jobs

    Get PDF
    Common operation scheduling (COS) problems arise in real-world applications, such as industrial processes of material cutting or component dismantling. In COS, distinct jobs may share operations, and when an operation is done, it is done for all the jobs that share it. We here propose a 0-1 LP formulation with exponentially many inequalities to minimize the weighted number of tardy jobs. Separation of inequalities is in NP, provided that an ordinary min Lmax scheduling problem is in P. We develop a branch-and-cut algorithm for two cases: one machine with precedence relation; identical parallel machines with unit operation times. In these cases separation is the constrained maximization of a submodular set function. A previous method is modified to tackle the two cases, and compared to our algorithm. We report on tests conducted on both industrial and artificial instances. For single machine and general processing times the new method definitely outperforms the other, extending in this way the range of COS applications

    A stochastic estimated version of the Italian dynamic General Equilibrium Model (IGEM)

    Get PDF
    We estimate with Bayesian techniques the Italian dynamic General Equilibrium Model (IGEM), which has been developed at the Italian Treasury Department, Ministry of Economy and Finance, to assess the effects of alter-native policy interventions. We analyze and discuss the estimated effects of various shocks on the Italian economy. Compared to the calibrated version used for policy analysis, we find a lower wage rigidity and higher adjustment costs. The degree of prices and wages indexation to past inflation is much smaller than the indexation level assumed in the calibrated model. No substantial difference is found in the estimated monetary parameters. Estimated fiscal multipliers are slightly smaller than those obtained from the calibrated version of the model

    LAF : Logic Alignment Free and its application to bacterial genomes classification

    Get PDF
    Alignment-free algorithms can be used to estimate the similarity of biological sequences and hence are often applied to the phylogenetic reconstruction of genomes. Most of these algorithms rely on comparing the frequency of all the distinct substrings of fixed length (k-mers) that occur in the analyzed sequences. In this paper, we present Logic Alignment Free (LAF), a method that combines alignment-free techniques and rule-based classification algorithms in order to assign biological samples to their taxa. This method searches for a minimal subset of k-mers whose relative frequencies are used to build classification models as disjunctive-normal-form logic formulas (if-then rules). We apply LAF successfully to the classification of bacterial genomes to their corresponding taxonomy. In particular, we succeed in obtaining reliable classification at different taxonomic levels by extracting a handful of rules, each one based on the frequency of just few k-mers. State of the art methods to adjust the frequency of k-mers to the character distribution of the underlying genomes have negligible impact on classification performance, suggesting that the signal of each class is strong and that LAF is effective in identifying it.Peer reviewe
    • …
    corecore